How to use GenerativeProteomics ================================= If your main goal is simply to just impute a general dataset, the most straightforward and simplest way to use GenerativeProteomics is to run: .. code-block:: bash python generativeproteomics.py -i /path/to/file_to_impute.csv By running it in this manner, it will result in two separate training phases. 1. **Evaluation run**: In this run a percentage of the values (10% by default) are concealed during the training phase and then the dataset is imputed. The RMSE (Root Mean Square-Error) is calculated with those hidden values as targets and at the end of the training phase a **test_imputed.csv** file will be created containing the original hidden values and the resulting imputation. This way you can have an estimation of the imputation accuracy. 2. **Imputation run**: Afterwards, a proper training phase takes place using the entire dataset. An **imputed.csv** file will be created containing the imputed dataset. However, there might be a few arguments which you may want to change. You can do this using a **parameters.json** file (you may find an example in ``GenerativeProteomics/breast/parameters.json``) or you can choose them directly in the command line. Run with a parameters.json file: .. code-block:: bash python generativeproteomics.py --parameters /path/to/parameters.json Run with command line arguments: .. code-block:: bash python generativeproteomics.py -i /path/to/file_to_impute.csv -o imputed_name --ofolder ./results/ --it 2001 Arguments: - **-i**: Path to file to impute - **-o**: Name of imputed file - **--ofolder**: Path to the output folder - **--it**: Number of iterations to train the model - **--miss**: The percentage of values to be concealed during the evaluation run (from 0 to 1) - **--outall**: Set this argument to 1 if you want to output every metric - **--override**: Set this argument to 1 if you want to delete the previously created files when writing the new output - **--model**: Choose the model to use (None if GenerativeProteomics, otherwise provide name of the pre-trained model) If you want to assess the efficiency of the code you may provide a reference file containing a complete version of the dataset (without missing values): .. code-block:: bash python generativeproteomics.py -i /path/to/file_to_impute.csv --ref /path/to/complete_dataset.csv Running this way will calculate the RMSE of the imputation in relation to the complete dataset.